DOCUMENT RESUME

ED 110 US                                          TM 004 739

AUTHOR          Hill, Richard K.
TITLE           Minimizing Context Effect When Using Multiple Matrix
                Sampling.
PUB DATE        Apr 75
NOTE            10p.; Paper presented at the Annual Meeting of the
                National Council on Measurement in Education
                (Washington, D.C., March 31-April 2, 1975)

EDRS PRICE      MF-$0.76 HC-$1.58 PLUS POSTAGE
DESCRIPTORS     *Bias; *Item Sampling; Matrices; Standardized Tests;
                *Statistical Analysis; *Testing; Testing Problems
IDENTIFIERS     *Multiple Matrix Sampling

ABSTRACT


This study is an a priori demonstration of the
applicability of multiple matrix sampling techniques to the practical
research problem of parameter estimation. Three tests were
administered to two separate but parallel populations, with one
receiving item samples and the other receiving full tests. Special
efforts were made to minimize the context effects due to sampling
procedures. Parameters estimated from matrix sampling statistics
closely matched those estimated from full test results, indicating
little context effect bias due to the sampling procedures.
(Author)




Documents acquired by ERIC include many informal unpublished
materials not available from other sources. ERIC makes every effort
to obtain the best copy available. Nevertheless, items of marginal
reproducibility are often encountered and this affects the quality
of the microfiche and hardcopy reproductions ERIC makes available
via the ERIC Document Reproduction Service (EDRS). EDRS is not
responsible for the quality of the original document. Reproductions
supplied by EDRS are the best that can be made from the original.




MINIMIZING CONTEXT EFFECT WHEN USING MULTIPLE MATRIX SAMPLING*

Richard K. Hill
California State Department of Education






Objectives of the Inquiry. Multiple matrix sampling is rapidly gaining
popularity as a tool for evaluation and research, but to date the development of
the theory has far outstripped the practical application of it. A review of
the multiple matrix sampling literature shows that a small minority of the
studies conducted to date have been performed a priori. Most have been
concerned with verification of the theory for estimating parameters from matrix
samples, and thus have been either post hoc or Monte Carlo studies.

Although several studies have demonstrated that, in theory, parameter
estimation can be accomplished through matrix sampling, little evidence exists
that the approach works in practice. There is a need to develop and verify
procedures to be used with multiple matrix sampling which will minimize the
practical problem of context effect. The purpose of this paper is to propose
some guidelines for use when applying multiple matrix sampling techniques, and
to empirically test the effectiveness of these guidelines with a variety of
test types.



Method. In this experiment, three tests were administered to fourth and fifth
grade pupils in selected schools throughout New York and Pennsylvania. The
tests were:

1. A 25-item short form of the Otis-Lennon Mental Ability Test,
   Intermediate II level.

2. The 60-item Intermediate I Science subtest of the Stanford
   Achievement Test battery.

3. A 35-item test, the Eastern Regional Institute for Education
   (ERIE) Science Process Test, designed by ERIE personnel to
   measure objectives of the Science - A Process Approach (SAPA)
   curriculum.

The author wishes to recognize the cooperation and assistance of the staff
of the Eastern Regional Institute for Education in conducting the study.

* Paper presented at the National Council on Measurement in Education
Annual Meeting, Washington, D.C., 1975.


There were several reasons for selecting these tests. The Otis-Lennon
and the Science Achievement tests are widely used standardized tests.
Previous testing by ERIE showed them to be of approximately 50 per cent difficulty
level for fourth graders, and, since fifth graders were to be used in the
study also, the distribution of scores was expected to be negatively skewed.
The ERIE Science Process Test had been very difficult for fourth graders
in previous testings (results from the previous year had yielded a mean of
12.4 for the 35 items) and the distribution of test scores was expected to be
severely positively skewed.

In virtually all matrix sampling studies to date, a major obstacle to
clear interpretation has been that subjects received both the item samples
and the total test. But in this study no subject who received an item sample
received the total test, and vice versa. Two separate but parallel populations
were generated by drawing two fourth and two fifth grade classes in each of
14 schools. A class in each grade from each school was administered the item
samples, with the other two classes receiving the full tests. Thus, no examinee
was contaminated by being tested under both conditions, and yet parameters
estimated through matrix sampling procedures could be compared to those
estimated by full test administration.



The administration of the tests was designed to minimize the other
major difficulty encountered in many matrix sampling studies: violation of the
assumption of no context effect, which is composed largely of two factors,
speededness and fatigue. In order to minimize fatigue effects, the students
who were given the full tests received them over a two day period; the
Science Process Test was administered the first morning, the Otis-Lennon
Aptitude Test that afternoon, and the Science Achievement Test the following
morning. The times allowed to complete the tests were 45, 30 and 45 minutes,
respectively, which were considered ample. The time allowed for students
taking the item samples was 20 minutes, which was considered to be a generous
estimate also. Thus, effects of speededness should have been minimal.

The item samples were chosen so that all samples were mutually
exclusive and exhaustive. Five samples of five items each were chosen from
the Otis-Lennon; seven samples of five items each from the Science Process
Test; and ten samples of six items each from the Science Achievement Test.

The item samples from the tests were combined so that each possible
combination of the samples from the three tests occurred at least twice,
but no more than three times (there were a total of 5 x 7 x 10 = 350 possible
combinations). Booklets then were constructed with the items from the
Otis-Lennon as item numbers 1-5, the Science Process Test 6-10, and the Science
Achievement 11-16. Each student knew only that he was receiving a 16-item
test; no mention was made of the fact that his item sample was composed of
three different tests.
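
The booklet construction described above can be sketched in a few lines of Python. This is only an illustration, not the procedure actually used in the study: the function names and the round-robin assignment rule are assumptions introduced here, and only the sample counts (5, 7 and 10) and the item positions (1-5, 6-10, 11-16) come from the text.

```python
from itertools import product

# Number of item samples drawn from each test, as stated in the text.
N_OTIS, N_PROCESS, N_ACHIEVEMENT = 5, 7, 10


def booklet_combinations():
    """Every (Otis-Lennon, Science Process, Science Achievement) sample triple."""
    return list(product(range(1, N_OTIS + 1),
                        range(1, N_PROCESS + 1),
                        range(1, N_ACHIEVEMENT + 1)))


def booklet_layout(items_per_sample=(5, 5, 6)):
    """Booklet item numbers (first, last) occupied by each test's sample."""
    tests = ("Otis-Lennon", "Science Process", "Science Achievement")
    layout, start = {}, 1
    for test, n_items in zip(tests, items_per_sample):
        layout[test] = (start, start + n_items - 1)
        start += n_items
    return layout


def assign_booklets(n_booklets):
    """Hypothetical rule: cycle through the combinations so that usage
    counts of any two combinations never differ by more than one."""
    combos = booklet_combinations()
    return [combos[i % len(combos)] for i in range(n_booklets)]
```

With this rule, printing 700 booklets would use each of the 350 combinations exactly twice; any number between 700 and 1050 keeps every combination within the "at least twice, no more than three times" range.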



Data Sources. After pretesting to ensure that the directions were clear and
the time limits were reasonable, the tests were administered to fourth and
fifth grade students in the selected schools. The answer sheets were returned
to the author for scoring and analysis.

Results. A total of 602 pupils took the full battery of tests, while 653
took the item samples. For each of the three tests, the mean and standard
deviation were calculated. For the item samples, the statistics necessary to
estimate total test mean and standard deviation were calculated. These results
are displayed in Tables 1-3.
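
The text does not spell out how the total test mean was estimated from each item sample. A common matrix sampling estimator, assumed here purely for illustration, scales the item-sample mean by the ratio of test length to sample length and then averages the per-sample estimates weighted by the number of pupils. The function names and the toy scores in the usage note are hypothetical.

```python
def estimate_total_mean(sample_scores, items_in_sample, items_in_test):
    """Scale the mean score on one item sample up to the full test length.

    Assumed estimator: (items_in_test / items_in_sample) * sample mean.
    """
    sample_mean = sum(sample_scores) / len(sample_scores)
    return sample_mean * items_in_test / items_in_sample


def weighted_mean(estimates, n_pupils):
    """Average per-sample estimates, weighting each by its number of pupils."""
    total_pupils = sum(n_pupils)
    return sum(e * n for e, n in zip(estimates, n_pupils)) / total_pupils
```

For example, three pupils scoring 3, 4 and 5 on a five-item sample from a 25-item test would give an estimated total test mean of 20.0 under this assumption.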

In each case, total test variance was estimated two different ways. The
first estimate was calculated using an equation credited to Lord (1960):

\[ V(X) = M \cdot V(X_i)\,[\,1 + (M-1)\,\mathrm{KR20}\,] \tag{1} \]

where V(X) = the estimated total test variance
      M = the number of item samples
      V(X_i) = the variance of the ith item sample
      KR20 = the value of KR20 for the ith item sample

The second estimate was calculated using an equation from Hill (1972):

\[ V(X) = M \cdot V(X_i)\,[\,1 + (M-1)\,\mathrm{KR21}\,] \tag{2} \]
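
As a rough sketch of equations (1) and (2), the Python below computes KR20 and KR21 for one item sample of right/wrong (1/0) scores and plugs either value into the same estimator. The function names, the use of the population (divide-by-n) variance, and the example data are assumptions made for illustration; the paper does not say which variance convention was used.

```python
def variance(values):
    """Population variance (divide by n); an assumed convention."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n


def kr20(item_scores):
    """Kuder-Richardson formula 20 for a pupils x items matrix of 0/1 scores."""
    n, k = len(item_scores), len(item_scores[0])
    totals = [sum(row) for row in item_scores]
    # Sum over items of p * q, where p is the proportion answering correctly.
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in item_scores) / n
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / variance(totals))


def kr21(item_scores):
    """Kuder-Richardson formula 21: needs only the mean and variance of totals."""
    k = len(item_scores[0])
    totals = [sum(row) for row in item_scores]
    mean = sum(totals) / len(totals)
    return (k / (k - 1)) * (1 - mean * (k - mean) / (k * variance(totals)))


def estimate_total_variance(sample_variance, reliability, n_samples):
    """Equations (1) and (2): M * V(Xi) * [1 + (M - 1) * reliability]."""
    return n_samples * sample_variance * (1 + (n_samples - 1) * reliability)


# Hypothetical item sample: four pupils, three items.
example = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
totals = [sum(row) for row in example]
lord = estimate_total_variance(variance(totals), kr20(example), n_samples=5)
hill = estimate_total_variance(variance(totals), kr21(example), n_samples=5)
```

Because KR21 cannot exceed KR20 for the same data, the second estimate can never be larger than the first, which agrees with the pattern in the tables wherever both entries are legible.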



Table 1. Item Sample and Total Test Statistics
         Obtained on the Otis-Lennon Test

                    Number     Estimated     Estimated Total Test Variance
                    of         Total         Lord          Hill
Item Sample No.     Pupils     Test Mean     Estimate      Estimate

      1             136        19.41         18.36         17.59
      2             137        V.56          27.52         ?*^.27
      3             127        16.61         36.92         :?^.86
      4             126        15.24         21.29         15.99
      5             127        16.06         18.43         16.00

Weighted mean
over item samples              16.92         ZhM           21.94

Total Test Results

Number of
Pupils              Mean       Variance

602                 17.0a      29.05






Table 2. Item Sample and Total Test Statistics
         Obtained on the ERIE Science Process Test

Item Sample No.:                  1, 2, 3, 4, 5, 6, 7
Number of Pupils:                 88, 93, 89, 101, 92, 91, 99
Estimated Total Test Mean:        17.76, 10.78, 13.35, 14.46, -3/4.62, 11.67
Estimated Total Test Variance
    Lord Estimate:                48.88, 38.64, ir.21, 50.09, 28.13, 17.65, 13.41
    Hill Estimate:                44.09, 24.68, 6.98, 46.52, 24.06, 10.24, 7.85

Weighted mean over item samples
    Estimated Total Test Mean:    13.98
    Lord Estimate:                29.72
    Hill Estimate:                21.25

Total Test Results

Number of
Pupils              Mean       Variance

602                 14.16      20.70






Table 3. Item Sample and Total Test Statistics Obtained on the
         Stanford Science Achievement Subtest

                    Number     Estimated     Estimated Total Test Variance
                    of         Total         Lord          Hill
Item Sample No.     Pupils     Test Mean     Estimate      Estimate

      1             65                       146.01        136.72
      2             69         39.13         208.22        2db,9g
      3             63         33.65         96.11         81.28
      4             61         30.82                       141.05
      5             72         37.22         149.59        136.56
      6             70         41.14         72f.6l        57.72
      7             62         kk.%9         69.11         :)5.25
      8             60         39.33         107.68        89.16
      9             67                       S94^          81.71
     10             64         38.28         111.70        103.14

Weighted mean
over item samples              37.71         120.69        108.36

Total Test Results

Number of
Pupils              Mean       Variance

602                 36 M>      108.16






F-ratios computed to test for statistically significant differences
between the means of the testing conditions were .95, .30 and 4.40 for the
three tests, respectively, with 1 and 1253 degrees of freedom. (More
properly, the F-ratios should be calculated with class as the experimental unit,
since subjects were not randomly assigned to classes. But since the intent
of these calculations is to show the high degree of similarity between the
two sampling results, the more conservative approach of using subjects as
the experimental units is used.) These results show no matrix sampling bias
for the first two tests, and a very slight bias for the third test.

F-ratios computed to test for statistically significant differences
between the obtained variance and the Lord estimate of the variance were
1.45, 1.19 and 1.12 for the three tests, respectively, with 601 and 652
degrees of freedom. These results are statistically significant at the f03
level, two-tailed, for the first test only. F-ratios computed using the Hill
estimate of the variance were 1.03, 1.32 and 1.00 for the three tests,
respectively. These results are statistically significant for the second test.

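The variance F-ratios quoted above appear to be simple ratios of the obtained full-test variance and the matrix sampling estimate, larger over smaller. The short Python sketch below reproduces that arithmetic for the Stanford subtest using the values printed in Table 3 (108.16 obtained; 120.69 and 108.36 estimated). The function name is an assumption, and no significance test is attempted here.

```python
def variance_ratio(variance_a, variance_b):
    """F-ratio of two variances, with the larger one in the numerator."""
    larger, smaller = max(variance_a, variance_b), min(variance_a, variance_b)
    return larger / smaller


# Stanford Science Achievement subtest, values as printed in Table 3.
OBTAINED = 108.16        # variance from the 602 pupils who took the full test
LORD_ESTIMATE = 120.69   # weighted mean of the Lord estimates
HILL_ESTIMATE = 108.36   # weighted mean of the Hill estimates

lord_f = variance_ratio(LORD_ESTIMATE, OBTAINED)
hill_f = variance_ratio(HILL_ESTIMATE, OBTAINED)
print(f"Lord: {lord_f:.2f}  Hill: {hill_f:.2f}")
```

Rounded to two places, these ratios agree with the 1.12 and 1.00 reported for the third test.
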
Discussion and Conclusion. The results indicate that matrix sampling can be
practically applied when care is taken to minimize violation of assumptions.
The very close matchups of means on the Science Process Test and the
Otis-Lennon reflect this. The higher mean obtained from the matrix-sampled
pupils on the Stanford may well emanate from a violation of the assumptions
discussed earlier. The Stanford was given last in all cases, after two
tests the previous day. Observation indicated that both pupils and teachers
were test-weary, and it was not unexpected to find lower mean scores on this
test from the total test group. Had the administration of the Stanford been
delayed for perhaps a week, the results may have correlated as well with
matrix sampling estimates as did the other two tests.

The results concerning the variances were very much as expected.
Previous studies had produced results which indicated that variances could be
estimated well by matrix sampling. Both the Lord and Hill estimates were
effective in estimating total-test variance in two of the three cases,



